A comparative analysis on the bisecting K-means and the PDDP clustering algorithms

Authors

  • Sergio M. Savaresi
  • Daniel L. Boley
Abstract

This paper deals with the problem of clustering a data set, and in particular with the bisecting divisive partitioning approach. We focus on two algorithms: the celebrated K-means algorithm and the recently proposed Principal Direction Divisive Partitioning (PDDP) algorithm. The two algorithms are compared under the assumption that the data set is uniformly distributed within an ellipsoid. The dynamic behavior of the K-means iterative procedure is studied and discussed, and a closed-form model is given for the 2-dimensional case.
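As a rough illustration of the setting described in the abstract, the sketch below draws points uniformly from a 2-dimensional ellipse and bisects them once with each method. It is a minimal sketch, not the authors' implementation: the function names (`sample_ellipse`, `pddp_split`, `two_means_split`), the power iteration used in place of an SVD to find the principal direction, the semi-axis lengths, and the choice of the first two sample points as initial centroids are all assumptions made for illustration.

```python
import math
import random


def mean(points):
    """Centroid of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sample_ellipse(n, a, b, seed=0):
    """Draw n points uniformly from the ellipse with semi-axes a and b
    (rejection sampling in the unit disk, then axis scaling)."""
    rng = random.Random(seed)
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            pts.append((a * x, b * y))
    return pts


def pddp_split(points, iters=200):
    """One PDDP-style bisection: center the data, find the principal
    direction, and split by the sign of each point's projection onto it."""
    w = mean(points)
    centered = [tuple(x - m for x, m in zip(p, w)) for p in points]
    dim = len(w)
    # Unnormalized covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(dim)]
           for i in range(dim)]
    # Assumption: power iteration stands in for an SVD here.
    u = [1.0] * dim
    for _ in range(iters):
        v = [sum(cov[i][j] * u[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    proj = [sum(a * b for a, b in zip(c, u)) for c in centered]
    left = [p for p, s in zip(points, proj) if s <= 0]
    right = [p for p, s in zip(points, proj) if s > 0]
    return left, right, u


def two_means_split(points, c_left, c_right, iters=100):
    """One bisecting K-means step (K = 2) using Lloyd iterations that
    start from the two given centroids."""
    for _ in range(iters):
        left, right = [], []
        for p in points:
            nearer_left = math.dist(p, c_left) <= math.dist(p, c_right)
            (left if nearer_left else right).append(p)
        if not left or not right:
            break  # degenerate split; stop with the current centroids
        new_left, new_right = mean(left), mean(right)
        if new_left == c_left and new_right == c_right:
            break  # assignments no longer change
        c_left, c_right = new_left, new_right
    return left, right, (c_left, c_right)


# Hypothetical example: 2000 points in an ellipse with semi-axes 2 and 1.
points = sample_ellipse(2000, a=2.0, b=1.0)
pddp_left, pddp_right, direction = pddp_split(points)
# Assumption: the first two sample points serve as initial centroids.
km_left, km_right, centroids = two_means_split(points, points[0], points[1])
print("PDDP cluster sizes:", len(pddp_left), len(pddp_right))
print("2-means cluster sizes:", len(km_left), len(km_right))
```

Changing the initial centroids passed to `two_means_split` is one simple way to observe how the K-means iteration moves toward its final split, whereas the PDDP split is computed in one shot from the principal direction.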


Similar resources

Bisecting K-means and PDDP: A Comparative Analysis

This paper deals with the problem of clustering a data-set. In particular, the bisecting divisive partitioning approach is here considered. We focus on two algorithms: the celebrated K-means algorithm, and the recently proposed Principal Direction Divisive Partitioning (PDDP) algorithm. A comparison of the two algorithms is given, under the assumption that the data set is uniformly distributed ...


On the performance of bisecting K-means and PDDP

problem is known as bisecting divisive clustering. Note that by recursively using a divisive bisecting clustering procedure, the dataset can be partitioned into any given number of clusters. Interestingly enough, the clusters so-obtained are structured as a hierarchical binary tree (or a binary taxonomy). This is the reason why the bisecting divisive approach is very attractive in many applicat...


Choosing the cluster to split in bisecting divisive clustering algorithms

This paper deals with the problem of clustering a data-set. In particular, the bisecting divisive approach is here considered. This approach can be naturally divided into two sub-problems: the problem of choosing which cluster must be divided, and the problem of splitting the selected cluster. The focus here is on the first problem. The contribution of this work is to propose a new simple techn...


On the performance of bisecting K-means and PDDP

The problem this paper focuses on is the unsupervised clustering of a data-set. The dataset is given by the matrix $M = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{p \times N}$, where each column of $M$, $x_i \in \mathbb{R}^p$, is a single data-point. This is one of the more basic and common problems in fields like pattern analysis, data mining, document retrieval, image segmentation, decision making, etc. ([12, 13]). The specific...


A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS

Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are used in many fields, a lot of research is still going on to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of the cluster centers and hence can become trapped in a local optimum. In this paper...



Journal:
  • Intell. Data Anal.

Volume: 8

Publication date: 2004